Local extrema of entropy functions under tensor products

We show that under a certain condition of local commutativity the minimum von Neumann entropy output of a quantum channel is locally additive. We also show that local minima of the 2-norm entropy functions are closed under tensor products if one of the subspaces has dimension 2.



Let $K$ be a subspace of the $m \times n$ complex matrices, and let $x \in K$ with $\mathrm{Tr}[xx^*] = 1$. Then the von Neumann entropy of $x$ is

\[ H(x) := -\mathrm{Tr}[xx^* \ln xx^*], \]

and the minimum entropy output of the subspace $K$ is

\[ H(K) := \min_{x \in K,\ \mathrm{Tr}[xx^*] = 1} H(x). \]



Recently, Hastings [1] disproved the famous additivity conjecture, which posited that

\[ H(K_1 \otimes K_2) = H(K_1) + H(K_2). \tag{1} \]

This conjecture was considered one of the most significant open problems in quantum information theory, spawning a large literature [2]. Its importance was motivated in part by the problem of finding the classical capacity of a quantum channel, and in part by a result of Peter Shor [3] that showed that a number of apparently distinct additivity conjectures, including the additivity of the minimum entropy output of a quantum channel, the additivity of the entanglement of formation, and the additivity of the Holevo capacity, were all equivalent.

Hastings' counterexample showed that the von Neumann entropy function is not globally additive on subspaces: in other words, if $x_1$ is a global minimum in $K_1$ and $x_2$ is a global minimum in $K_2$, then $x_1 \otimes x_2$ is not necessarily a global minimum in $K_1 \otimes K_2$. On the other hand, in this paper we show that under certain conditions the von Neumann entropy is locally additive. More precisely, we show that if $K_i$ is a subspace with a local minimum $x_i$, and $x_i x_i^*$ commutes with $x_i y_i^*$ for every $y_i \in K_i$, then $x_1 \otimes x_2$ is a local minimum of $K_1 \otimes K_2$; we call this condition the local commutativity condition. More generally, we study the behaviour of entropy functions of the eigenvalues of $xx^*$, and we consider when the tensor product of two local minima is again a local minimum.

The paper is organized as follows. In Section I we analyze the local commutativity condition. In Section II we consider the first derivative of the entropy function and note that critical points of the von Neumann and Renyi entropies are closed under tensor products. These results are due to a group participating in the American Institute for Mathematics workshop on "Geometry and representation theory" [4]. In Section III we consider the second derivative of the von Neumann entropy function, and show that local minima of von Neumann entropy are closed under tensor products, given the previously mentioned commutativity assumption. Finally, in Section IV we consider the second derivative of the 2-norm entropy function. We show that local minima of the 2-norm are closed under tensor products if one of the subspaces has dimension 2. In Appendix A we analyze the affine parametrization and use it to derive a necessary condition for local minima. In Appendix B we show that there is a simple counterexample to the additivity conjecture over the real numbers.



I. THE LOCAL COMMUTATIVITY 
CONDITION 

For a given function $f : [0, \infty) \to (-\infty, \infty)$ we define $f(x) = \sum_{i=1}^{m} f(\lambda_i(xx^*))$ for $x \in \mathbb{C}^{m \times n}$, where $\lambda_i$ are the eigenvalues of $xx^*$. We assume that either $f$ is smooth on $[0, \infty)$, i.e. has two continuous derivatives at every $t \ge 0$, or $f(t) = H(t) = -t \log t$. Let $D_y f(x)$, $D_y^2 f(x)$ denote the first and the second derivative of $f$ in the $y$ direction:

\[ D_y f(x) = \frac{d}{d\epsilon} f(x + \epsilon y)\Big|_{\epsilon=0}, \qquad D_y^2 f(x) = \frac{d^2}{d\epsilon^2} f(x + \epsilon y)\Big|_{\epsilon=0}. \]

Then $x$ is a critical point if and only if $D_y f(x) = 0$ for each $y \in K$ (we discuss this condition in more detail in the next section).

Here we focus on the function $f(t) = H(t) = -t \log t$. In this case we need to be very careful when dealing with $xx^*$ which have zero eigenvalues. We will see that for any $x, y \in \mathbb{C}^{m \times n}$, $D_y f(x) \in \mathbb{R}$. However it is possible that $D_y^2 f = \infty$, and below we give the exact conditions on $y$ when this happens. Hence if $x$ is a critical point of the von Neumann entropy $H(x)$ and $D_y^2 H(x) = \infty$, then $H(x + \epsilon y) > H(x)$ for small enough $\epsilon$. Thus when we study in the next sections the local minimum of $H(K_1 \otimes K_2)$ at the critical point $x_1 \otimes x_2$ we need only consider $y_i$ such that $D_{y_i}^2 f < \infty$ for $i = 1, 2$. This will also give a partial explanation of the local commutativity condition discussed in the introduction.

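Lemma 1. Let $x, y \in \mathbb{C}^{m \times n}$, $\mathrm{Tr}\, xx^* > 0$ and $H(t) = -t \log t$. Then $D_y H(x) \in \mathbb{R}$. Change standard orthonormal bases in $\mathbb{C}^m, \mathbb{C}^n$ to new orthonormal bases such that $x, y$ have the forms

\[ x = \begin{bmatrix} x_{11} & 0_{r,n-r} \\ 0_{m-r,r} & 0_{m-r,n-r} \end{bmatrix} \quad \text{and} \quad y = \begin{bmatrix} y_{11} & y_{12} \\ y_{21} & y_{22} \end{bmatrix}, \tag{2} \]

with $0_{i,j} \in \mathbb{C}^{i \times j}$ and $x_{11}, y_{11} \in \mathbb{C}^{r \times r}$. Then $D_y^2 f(x) = \infty$ if and only if $y_{22} \neq 0$.

Proof. By considering $UKV$, where $U, V$ are unitary, we may assume that $x, y$ are in the form (2). Furthermore $x_{11} = D = \mathrm{diag}(d_1, \ldots, d_r)$, where $d_1 \ge d_2 \ge \ldots \ge d_r > 0$ and $r$ is the rank of $x$. So $d_i$ is the $i$-th singular value $\sigma_i(x_{11})$ for $i = 1, \ldots, r$. Observe next that

\[ \mathrm{Tr}((x + \epsilon y)(x + \epsilon y)^*) = \sum_{i=1}^{m} \lambda_i((x + \epsilon y)(x + \epsilon y)^*). \tag{3} \]

We assume here that the eigenvalues of a hermitian matrix are arranged in a nonincreasing order. Note that

\[ (x + \epsilon y)(x + \epsilon y)^* = xx^* + \epsilon(xy^* + yx^*) + \epsilon^2 (yy^*). \]

Hence

\[ \lambda_i((x + \epsilon y)(x + \epsilon y)^*) = \lambda_i(xx^* + \epsilon(xy^* + yx^*)) + O(\epsilon^2). \]

Observe next that

\[ xx^* + \epsilon(xy^* + yx^*) = \begin{bmatrix} D^2 + \epsilon(D y_{11}^* + y_{11} D) & \epsilon D y_{21}^* \\ \epsilon y_{21} D & 0 \end{bmatrix}. \]

For small $\epsilon$, the first variation formula (see [5]) yields

\[ \lambda_i((x + \epsilon y)(x + \epsilon y)^*) = d_i^2 + d_i' \epsilon + O(\epsilon^2) \ \text{ for } i = 1, \ldots, r, \qquad \lambda_i((x + \epsilon y)(x + \epsilon y)^*) = O(\epsilon^2) \ \text{ for } i > r. \]

Hence $\lambda_i((x + \epsilon y)(x + \epsilon y)^*) = d_i'' \epsilon^2 + O(\epsilon^3)$ for $i > r$, with $d_{r+1}'' \ge \cdots \ge d_m'' \ge 0$. These calculations show that

\[ H(x + \epsilon y) = H(D + \epsilon y_{11}) - \sum_{i=r+1}^{m} d_i'' \epsilon^2 \log(d_i'' \epsilon^2) + O(\epsilon^2). \]

Hence $D_y H(x) \in \mathbb{R}$ and $D_y^2 H(x) = \infty$ if and only if $d_{r+1}'' > 0$. It is left to show that $d_{r+1}'' > 0$ if and only if $y_{22} \neq 0$. Consider $\Lambda_{r+1}(x + \epsilon y)$, the $(r+1)$-th compound matrix of $x + \epsilon y$. (Recall that $\Lambda_{r+1}(x + \epsilon y)$ is the $\binom{m}{r+1} \times \binom{n}{r+1}$ matrix whose entries are the $(r+1) \times (r+1)$ minors of $x + \epsilon y$.) Note that $\Lambda_{r+1}(x + \epsilon y)$ is a polynomial matrix in $\epsilon$. Since $x$ has rank $r$ it follows that $\Lambda_{r+1}(x) = 0$. Hence $\Lambda_{r+1}(x + \epsilon y) = \epsilon z_1 + \epsilon^2 z_2(\epsilon)$, where $z_1$ is a constant matrix and $z_2(\epsilon)$ is a polynomial matrix in $\epsilon$. We claim that $z_1 = 0$ if and only if $y_{22} = 0$. Indeed, since $D$ is diagonal, a minor of order $r+1$ that can have a nonzero derivative at $\epsilon = 0$ is the minor based on the rows $\alpha = \{1, \ldots, r, p\}$ and columns $\beta = \{1, \ldots, r, q\}$. Denote this minor by $\det(x + \epsilon y)[\alpha, \beta]$. Clearly $\det(x + \epsilon y)[\alpha, \beta] = \epsilon (d_1 \cdots d_r\, y_{p,q}) + O(\epsilon^2)$, where $y_{p,q}$ is the $(p,q)$ entry of $y$. So if $y_{22} = 0$ we obtain that $z_1 = 0$. Hence $\|\Lambda_{r+1}(x + \epsilon y)\|_2 = \sigma_1(\Lambda_{r+1}(x + \epsilon y)) \le \epsilon^2 a$ for some positive $a$. Recall that

\[ (\sigma_1(\Lambda_{r+1}(x + \epsilon y)))^2 = \prod_{i=1}^{r+1} \lambda_i((x + \epsilon y)(x + \epsilon y)^*). \]

As $(\sigma_1(\Lambda_{r+1}(x + \epsilon y)))^2 \le a^2 \epsilon^4$, we deduce that $d_{r+1}'' = 0$.

It is left to show that if $y_{p,q} \neq 0$ for some $p, q > r$, then $d_{r+1}'' > 0$. Clearly,

\[ \|\Lambda_{r+1}(x + \epsilon y)\|_2 \ge |\det(x + \epsilon y)[\alpha, \beta]| \ge \tfrac{1}{2}\, d_1 \cdots d_r\, |y_{p,q}|\, |\epsilon| \]

for all sufficiently small $\epsilon$. (The first inequality follows from the fact that the $\ell_2$ norm of a matrix is not less than the absolute value of any of its entries.) This shows that $d_{r+1}'' > 0$. □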
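The divergence in Lemma 1 can be seen in the smallest possible case. The following sketch is an illustration added here and is not part of the original argument: it assumes $x = \mathrm{diag}(1, 0)$ and $y = \mathrm{diag}(0, 1)$, so that $r = 1$, $y_{22} = 1 \neq 0$, and $(x + \epsilon y)(x + \epsilon y)^* = \mathrm{diag}(1, \epsilon^2)$. The function names are hypothetical.

```python
import math

def entropy_from_eigenvalues(eigs):
    """H = -sum(l * log l), with the convention 0 * log 0 = 0."""
    return -sum(l * math.log(l) for l in eigs if l > 0.0)

def h(eps):
    # Eigenvalues of (x + eps*y)(x + eps*y)^* for x = diag(1, 0), y = diag(0, 1).
    return entropy_from_eigenvalues([1.0, eps * eps])

def second_difference(eps):
    # Symmetric second difference quotient of h at 0 with step eps.
    return (h(eps) + h(-eps) - 2.0 * h(0.0)) / (eps * eps)

# Quotients for eps = 1e-1, 1e-2, ..., 1e-5.
quotients = [second_difference(10.0 ** (-k)) for k in range(1, 6)]
```

In this example the quotient equals $-2 \log \epsilon^2$, which grows without bound as $\epsilon \to 0$, consistent with $D_y^2 H(x) = \infty$ when $y_{22} \neq 0$.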
The lemma above implies that for the purpose of calculating local minima, without loss of generality, we can always take the directional derivatives in a direction with $y_{22} = 0$. In the lemma above, however, we did not impose the normalization condition $\mathrm{Tr}(xx^*) = 1$. As we show in the next lemma, it does not affect the result that $D_y^2 H = \infty$ if and only if $y_{22} \neq 0$.



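Lemma 2. Let $x, y \in \mathbb{C}^{m \times n}$ with $\mathrm{Tr}(xx^*) = 1$ and $y \neq 0$. Consider the matrix

\[ x(y, \epsilon) = \frac{1}{\sqrt{\mathrm{Tr}((x + \epsilon y)(x + \epsilon y)^*)}}\,(x + \epsilon y), \]

which is always defined for small $|\epsilon|$. Then $\frac{d}{d\epsilon} H(x(y, \epsilon))\big|_{\epsilon=0} \in \mathbb{R}$, and $\frac{d^2}{d\epsilon^2} H(x(y, \epsilon))\big|_{\epsilon=0} = \infty$ if and only if $D_y^2 f(x) = \infty$.

Proof. The functions $h_1(\epsilon) := (\mathrm{Tr}((x + \epsilon y)(x + \epsilon y)^*))^{-1}$ and $h_2(\epsilon) := \log \mathrm{Tr}((x + \epsilon y)(x + \epsilon y)^*)$ are analytic in a neighborhood of $\epsilon = 0$, and clearly

\[ H(x(y, \epsilon)) = h_1(\epsilon) f(x + \epsilon y) + h_2(\epsilon). \tag{4} \]

As $h_1(0) = 1$ we obtain

\[ \frac{d}{d\epsilon} H(x(y, \epsilon))\Big|_{\epsilon=0} = D_y f(x) + h_1'(0) H(x) + h_2'(0) \in \mathbb{R}, \]

while $\frac{d^2}{d\epsilon^2} H(x(y, \epsilon))\big|_{\epsilon=0}$ consists of $D_y^2 f(x)$ plus finite terms. The lemma follows. □

The two lemmas above imply the following characterization of the local commutativity condition discussed in the introduction.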

Lemma 3. Let the assumptions of Lemma 1 hold. Assume that $x, y$ are in the form (2). Then $xx^*$ commutes with $xy^*$ if and only if $y_{21} = 0$ and $x_{11} x_{11}^*$ commutes with $x_{11} y_{11}^*$ (which is equivalent to $x_{11}^* x_{11} y_{11}^* = y_{11}^* x_{11} x_{11}^*$).

Proof. Write $x$ and $y$ as in (2). Since $x_{11}$ is invertible, the assumption that $xx^*$ commutes with $xy^*$ is equivalent to $y_{21} = 0$ together with $x_{11} x_{11}^*$ commuting with $x_{11} y_{11}^*$. So $x_{11} x_{11}^* x_{11} y_{11}^* = x_{11} y_{11}^* x_{11} x_{11}^*$. Multiply both sides of this equality on the left by $x_{11}^{-1}$ to obtain the lemma. □

In particular, the above lemma together with the theorem in Section III implies that local additivity holds for subspaces consisting of matrices $y$ as in Eq. (2), with $y_{11}$ diagonal, $y_{21} = 0$, and $y_{12}$ arbitrary.

II. FIRST DERIVATIVE OF ENTROPY 
FUNCTIONS UNDER TENSOR PRODUCTS 

All of the results in this section are due to the "Quantum Information Group" participating in the workshop "Geometry and representation theory", held at the American Institute for Mathematics [4]; we record the results here for completeness.

For a given function $f(t)$ as defined above, let $D_y f(x)$ denote the derivative of $f$ in the $y$ direction:

\[ D_y f(x) = \frac{d}{d\epsilon} f(x + \epsilon y)\Big|_{\epsilon=0}. \]

Then $x$ is a critical point if and only if $D_y f(x) = 0$ for every $y$. Since we are interested in local minima in $K$ subject to $\mathrm{Tr}[xx^*] = 1$, we restrict $y$ to the tangent space $\{y \in K : D_y \mathrm{Tr}[xx^*] = 0\} = \{y \in K : \mathrm{Tr}[xy^* + yx^*] = 0\}$. Also, we restrict our attention to functions $f(x)$ which depend only on $xx^*$. Since $xx^*$ is invariant under $x \mapsto ix$, we may ignore $y = ix$. That is, $x \in K$ is critical if and only if $D_y f(x) = 0$ for every $y$ in the orthogonal subspace

\[ x^\perp := \{y \in K : \mathrm{Tr}[xy^*] = 0\}. \]

Under tensor products, the orthogonal subspace has the following decomposition:

\[ (x_1 \otimes x_2)^\perp = (x_1) \otimes x_2^\perp \ \oplus\ x_1^\perp \otimes (x_2) \ \oplus\ x_1^\perp \otimes x_2^\perp. \]

For a function $f(x)$ depending only on $xx^*$, a point $x \in K$ is critical in $K$ if and only if $D_y f(x) = 0$ for every $y \in x^\perp$. In general, given a univariate differentiable function $F$, a Taylor series expansion of $F$ shows that the matrix function $a \mapsto \mathrm{Tr}[F(a)]$ has directional derivative

\[ \frac{d}{d\epsilon} \mathrm{Tr}[F(a + \epsilon b)]\Big|_{\epsilon=0} = \mathrm{Tr}[F'(a) b]. \]

We are interested in the case $a = xx^*$ and $b = xy^* + yx^*$: if $f(x) = \mathrm{Tr}[F(xx^*)]$, then

\[ D_y f(x) = \mathrm{Tr}[F'(xx^*)(xy^* + yx^*)]. \]

This derivative is zero for all $y \in x^\perp$ if and only if $\mathrm{Tr}[F'(xx^*) x y^*] = 0$ for all $y \in x^\perp$.
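As a numerical sanity check of the directional derivative formula $\frac{d}{d\epsilon} \mathrm{Tr}[F(a + \epsilon b)]|_{\epsilon=0} = \mathrm{Tr}[F'(a) b]$, the sketch below takes the illustrative choice $F(t) = t^2$, so that $F'(t) = 2t$. The matrices `a` and `b` and all helper names are assumptions made for this example only; they do not come from the text.

```python
def matmul(p, q):
    """Product of two square matrices given as lists of rows."""
    n = len(p)
    return [[sum(p[i][k] * q[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def trace(p):
    return sum(p[i][i] for i in range(len(p)))

def add_scaled(p, q, eps):
    """Return p + eps * q."""
    n = len(p)
    return [[p[i][j] + eps * q[i][j] for j in range(n)] for i in range(n)]

def tr_F(p):
    # F(t) = t^2, so Tr[F(p)] = Tr[p^2].
    return trace(matmul(p, p))

a = [[0.7, 0.1], [0.1, 0.3]]    # symmetric, positive definite, trace 1
b = [[0.5, 0.2], [0.2, -0.5]]   # symmetric, trace 0

eps = 1e-6
# Central difference approximation of d/d(eps) Tr[F(a + eps*b)] at eps = 0.
numeric = (tr_F(add_scaled(a, b, eps)) - tr_F(add_scaled(a, b, -eps))) / (2 * eps)
# F'(t) = 2t, so Tr[F'(a) b] = 2 Tr[a b].
exact = 2.0 * trace(matmul(a, b))
```

For this quadratic $F$ the central difference is exact up to rounding, so `numeric` and `exact` should agree to many digits.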

Theorem 1. Let $F$ be a differentiable univariate function such that $F'(a_1 \otimes a_2)$ is in the span of

\[ \{F'(a_1) \otimes F'(a_2),\ F'(a_1) \otimes I,\ I \otimes F'(a_2),\ I \otimes I\}. \]

If $x_1$ and $x_2$ are critical points of $f(x) = \mathrm{Tr}[F(xx^*)]$ subject to $\mathrm{Tr}[xx^*] = 1$, then so is $x_1 \otimes x_2$.

Proof. Let $x = x_1 \otimes x_2$. It suffices to show that if $D_{y_i} f(x_i) = 0$ for all $y_i \in x_i^\perp$, then $D_y f(x) = 0$ for all $y \in x^\perp$. That is, if $\mathrm{Tr}[F'(x_i x_i^*) x_i y_i^*] = 0$, then $\mathrm{Tr}[F'(xx^*) x y^*] = 0$.

First, suppose $y = y_1 \otimes y_2$, for some arbitrary $y_1$ and $y_2$, and consider the term in $F'(xx^*)$ proportional to $F'(x_1 x_1^*) \otimes F'(x_2 x_2^*)$: we have

\[ \mathrm{Tr}[(F'(x_1 x_1^*) \otimes F'(x_2 x_2^*))(xy^*)] = \mathrm{Tr}[F'(x_1 x_1^*) x_1 y_1^*]\ \mathrm{Tr}[F'(x_2 x_2^*) x_2 y_2^*], \]

which is 0 provided that either $y_1 \in x_1^\perp$ or $y_2 \in x_2^\perp$ (or both). Likewise, for the term proportional to $F'(x_1 x_1^*) \otimes I$,

\[ \mathrm{Tr}[(F'(x_1 x_1^*) \otimes I)(xy^*)] = \mathrm{Tr}[F'(x_1 x_1^*) x_1 y_1^*]\ \mathrm{Tr}[x_2 y_2^*], \]

which again is 0 if either $y_1 \in x_1^\perp$ or $y_2 \in x_2^\perp$. Similarly, $\mathrm{Tr}[(I \otimes F'(x_2 x_2^*))(xy^*)] = 0$ and $\mathrm{Tr}[(I \otimes I)(xy^*)] = 0$. Combining the terms which make up $F'(xx^*)$, we see that $\mathrm{Tr}[F'(xx^*)(xy^*)] = 0$ whenever $y = y_1 \otimes y_2$ satisfies $y_1 \in x_1^\perp$ or $y_2 \in x_2^\perp$.

Now an arbitrary element $y \in x^\perp$ can be written as a linear combination of terms of the form $x_1 \otimes y_2$, $y_1 \otimes x_2$, and $y_1 \otimes y_2$, with $y_i \in x_i^\perp$. For each of these terms either the first or second component of the tensor product is in $x_i^\perp$. Therefore $\mathrm{Tr}[F'(xx^*) x y^*] = 0$ for all $y \in x^\perp$. □

Our main interest is in the function $x \mapsto -\mathrm{Tr}[xx^* \ln xx^*]$, which is proportional to the usual von Neumann entropy of the matrix $xx^*$. Letting $F(t) = -t \ln t$, so that $F'(t) = -(1 + \ln t)$, we have

\[ F'(a_1 \otimes a_2) = -I - \ln(a_1 \otimes a_2) = -I \otimes I - \ln(a_1) \otimes I - I \otimes \ln(a_2) \in \mathrm{span}\{I \otimes I,\ F'(a_1) \otimes I,\ I \otimes F'(a_2)\}. \]

(Here we used the fact that $\ln(a_1 \otimes a_2) = \ln(a_1) \otimes I + I \otimes \ln(a_2)$.) Thus the hypotheses of Theorem 1 are satisfied, and so critical points of $x \mapsto -\mathrm{Tr}[xx^* \ln xx^*]$ are closed under tensor products.
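The span condition for $F(t) = -t \ln t$ can be checked directly when $a_1$ and $a_2$ are diagonal, in which case every matrix involved is diagonal and can be stored as a list of eigenvalues. The sketch below is an illustration under that simplifying assumption; the eigenvalue lists and helper names are hypothetical.

```python
import math

def kron_diag(u, v):
    """Diagonal of the Kronecker product of two diagonal matrices."""
    return [s * t for s in u for t in v]

a1 = [0.6, 0.4]          # eigenvalues of a positive definite a_1
a2 = [0.5, 0.3, 0.2]     # eigenvalues of a positive definite a_2
ones1, ones2 = [1.0] * len(a1), [1.0] * len(a2)

# ln(a_1 (x) a_2) versus ln(a_1) (x) I + I (x) ln(a_2).
log_a1 = [math.log(t) for t in a1]
log_a2 = [math.log(t) for t in a2]
lhs = [math.log(t) for t in kron_diag(a1, a2)]
rhs = [s + t for s, t in zip(kron_diag(log_a1, ones2), kron_diag(ones1, log_a2))]

# F'(t) = -(1 + ln t) evaluated on a_1 (x) a_2 ...
F_prime_tensor = [-(1.0 + t) for t in lhs]
# ... equals F'(a_1) (x) I + I (x) F'(a_2) + I (x) I, an element of the span.
Fp1 = [-(1.0 + t) for t in log_a1]
Fp2 = [-(1.0 + t) for t in log_a2]
in_span = [s + t + 1.0
           for s, t in zip(kron_diag(Fp1, ones2), kron_diag(ones1, Fp2))]
```

The general (non-diagonal) case reduces to this one by diagonalizing $a_1$ and $a_2$ separately.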

Another important class of entropy functions are the $p$-norms:

\[ x \mapsto \|xx^*\|_p^p = \mathrm{Tr}[(xx^*)^p]. \]

Letting $F(t) = t^p$, so $F'(t) = p\,t^{p-1}$, we have

\[ F'(a_1 \otimes a_2) = p\,(a_1 \otimes a_2)^{p-1} = \frac{1}{p} F'(a_1) \otimes F'(a_2). \]

Again $F(t)$ is in the form of Theorem 1. Thus for both the von Neumann entropy and the $p$-norms, the tensor product of critical points (subject to $\mathrm{Tr}[xx^*] = 1$) is again a critical point.
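The $p$-norm identity $F'(a_1 \otimes a_2) = \frac{1}{p} F'(a_1) \otimes F'(a_2)$, together with the multiplicativity of $\mathrm{Tr}[a^p]$, can be illustrated on diagonal matrices. This is a minimal sketch assuming diagonal $a_1, a_2$ and the arbitrary choice $p = 3$; the names are hypothetical.

```python
def kron_diag(u, v):
    """Diagonal of the Kronecker product of two diagonal matrices."""
    return [s * t for s in u for t in v]

def schatten_power(eigs, p):
    """Tr[a^p] for a positive semidefinite a with the given eigenvalues."""
    return sum(t ** p for t in eigs)

p = 3.0
a1 = [0.6, 0.4]
a2 = [0.5, 0.3, 0.2]

def F_prime(eigs):
    # F(t) = t^p, so F'(t) = p * t^(p-1), applied eigenvalue by eigenvalue.
    return [p * t ** (p - 1.0) for t in eigs]

lhs = F_prime(kron_diag(a1, a2))
rhs = [t / p for t in kron_diag(F_prime(a1), F_prime(a2))]

norm_tensor = schatten_power(kron_diag(a1, a2), p)
norm_product = schatten_power(a1, p) * schatten_power(a2, p)
```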



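III. SECOND DERIVATIVE OF THE VON NEUMANN ENTROPY

In this section we show that under the local commutativity condition, if $x_1 \in K_1$ and $x_2 \in K_2$ are nonsingular strong local minima of

\[ x \mapsto -\mathrm{Tr}[xx^* \log xx^*] \]

subject to $\mathrm{Tr}[xx^*] = 1$, then $x_1 \otimes x_2$ is also a strong local minimum in $K_1 \otimes K_2$. More precisely, we assume that if $y_i \in K_i$ is orthogonal to $x_i$, then $x_i x_i^*$ and $x_i y_i^*$ commute. Throughout this section we will also assume without loss of generality that $\mathrm{Tr}[yy^*] = 1$.

In this section we work with the normalized entropy function

\[ H(x) := -\mathrm{Tr}\left[ \frac{xx^*}{\mathrm{Tr}[xx^*]} \log \frac{xx^*}{\mathrm{Tr}[xx^*]} \right]. \]

A point $x$ is a strong local minimum of $H$ on $\{x : \mathrm{Tr}[xx^*] = 1\}$ if and only if for every $y$ orthogonal to $x$, the second directional derivative $D_y^2 H(x)$ is positive.

Lemma 4. Assume $xx^*$ and $xy^*$ commute. Then

\[ D_y^2 H(x) = 2\,\mathrm{Tr}[xx^* \log xx^*] - 2\,\mathrm{Tr}[yy^* \log xx^*] - \mathrm{Tr}[(xy^* + yx^*)^2 (xx^*)^{-1}], \]

where the last trace is taken over the support of $xx^*$.

Proof. For convenience define $a = xx^*$, $b = xy^* + yx^*$, and $c = yy^*$, so that

\[ (x + \epsilon y)(x + \epsilon y)^* = a + \epsilon b + \epsilon^2 c. \]

Note that $\mathrm{Tr}[a] = \mathrm{Tr}[c] = 1$ and $\mathrm{Tr}[b] = 0$, so $\mathrm{Tr}[(x + \epsilon y)(x + \epsilon y)^*] = 1 + \epsilon^2$. Then

\[ H(x + \epsilon y) = -\mathrm{Tr}\left[ \frac{a + \epsilon b + \epsilon^2 c}{1 + \epsilon^2} \log \frac{a + \epsilon b + \epsilon^2 c}{1 + \epsilon^2} \right] = -\frac{1}{1 + \epsilon^2} \mathrm{Tr}[(a + \epsilon b + \epsilon^2 c) \log(a + \epsilon b + \epsilon^2 c)] + \log(1 + \epsilon^2). \]

Up to second order in $\epsilon$ this expression becomes

\[ H(x + \epsilon y) = -\mathrm{Tr}[a \log(a + \epsilon b + \epsilon^2 c)] - \epsilon\, \mathrm{Tr}[b \log(a + \epsilon b)] + \epsilon^2 (1 + \mathrm{Tr}[a \log a] - \mathrm{Tr}[c \log a]). \]

Therefore, the second order directional derivative can be expressed in the following way:

\[ D_y^2 H(x) = -\frac{d^2}{d\epsilon^2} \mathrm{Tr}[a \log(a + \epsilon b + \epsilon^2 c)]\Big|_{\epsilon=0} - 2 \frac{d}{d\epsilon} \mathrm{Tr}[b \log(a + \epsilon b)]\Big|_{\epsilon=0} + 2(1 + \mathrm{Tr}[a \log a] - \mathrm{Tr}[c \log a]). \]

To calculate the derivative expressions above, we will express the log function by its Taylor series:

\[ \log(a + \epsilon b) = \log[I - (I - a - \epsilon b)] = -\sum_{n=1}^{\infty} \frac{1}{n} (I - a - \epsilon b)^n. \]

Without loss of generality (see Lemma 1), in the last equality we assumed that $a$ is invertible, so that for sufficiently small $\epsilon$ also $a + \epsilon b$ is invertible and therefore $I - a - \epsilon b < I$. To calculate the derivative of $\mathrm{Tr}[b \log(a + \epsilon b)]$ at $\epsilon = 0$, we only need to take terms proportional to $\epsilon$ in the expansion of the logarithm. Assuming $a$ and $b$ commute,

\[ \frac{d}{d\epsilon} \mathrm{Tr}[b \log(a + \epsilon b)]\Big|_{\epsilon=0} = \sum_{n=1}^{\infty} \mathrm{Tr}[b^2 (I - a)^{n-1}] = \mathrm{Tr}[b^2 a^{-1}]. \]

To calculate the second derivative of $\mathrm{Tr}[a \log(a + \epsilon b + \epsilon^2 c)]$ we need only take the terms proportional to $\epsilon^2$. Again assuming $a$ and $b$ commute and $a$ is invertible,

\[ \frac{d^2}{d\epsilon^2} \mathrm{Tr}[a \log(a + \epsilon b + \epsilon^2 c)]\Big|_{\epsilon=0} = 2 \sum_{n=1}^{\infty} \mathrm{Tr}[a (I - a)^{n-1} c] - \sum_{n=2}^{\infty} \frac{2}{n} \binom{n}{2} \mathrm{Tr}[a (I - a)^{n-2} b^2] = -\mathrm{Tr}[a^{-1} b^2] + 2\,\mathrm{Tr}[c] = -\mathrm{Tr}[a^{-1} b^2] + 2. \]

Therefore $D_y^2 H(x) = 2\,\mathrm{Tr}[a \log a] - 2\,\mathrm{Tr}[c \log a] - \mathrm{Tr}[b^2 a^{-1}]$. □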
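The formula of Lemma 4 can be compared against a finite-difference second derivative in the simplest commuting situation, where $x$ and $y$ are both real diagonal matrices. The sketch below is an illustration under that assumption; the particular spectrum, the step size `h`, and the names are choices made for this example. With $x = \mathrm{diag}(s_i)$ and $y = \mathrm{diag}(v_i)$ we have $a = \mathrm{diag}(s_i^2)$, $b = \mathrm{diag}(2 s_i v_i)$ and $c = \mathrm{diag}(v_i^2)$.

```python
import math

p = [0.5, 0.3, 0.2]                      # eigenvalues of xx^*, trace 1
s = [math.sqrt(t) for t in p]            # x = diag(s)
norm = math.sqrt(s[0] ** 2 + s[1] ** 2)
v = [s[1] / norm, -s[0] / norm, 0.0]     # y = diag(v): Tr[xy^*] = 0, Tr[yy^*] = 1

def normalized_entropy(eps):
    """H(x + eps*y) for the diagonal x, y above, with the trace normalized to 1."""
    q = [(si + eps * vi) ** 2 for si, vi in zip(s, v)]
    total = sum(q)
    return -sum((t / total) * math.log(t / total) for t in q if t > 0.0)

h = 1e-3
# Symmetric second difference approximating D_y^2 H(x).
finite_difference = (normalized_entropy(h) - 2.0 * normalized_entropy(0.0)
                     + normalized_entropy(-h)) / h ** 2

# Lemma 4: 2 Tr[a log a] - 2 Tr[c log a] - Tr[b^2 a^{-1}].
lemma4 = (2.0 * sum(t * math.log(t) for t in p)
          - 2.0 * sum(vi ** 2 * math.log(t) for vi, t in zip(v, p))
          - sum((2.0 * si * vi) ** 2 / t for si, vi, t in zip(s, v, p)))
```

The two values agree up to the discretization error of the difference quotient. Nothing is claimed here about the sign: this $x$ is an arbitrary point, not a local minimum of any particular subspace.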



Corollary 1. Assume $xx^*$ and $xy^*$ commute. Then $D_y^2 H(x) > 0$ if and only if

\[ |\mathrm{Tr}[(xx^*)^{-1} (xy^*)^2]| + \mathrm{Tr}[(xx^*)^{-1} xy^* yx^*] < \mathrm{Tr}[xx^* \log xx^*] - \mathrm{Tr}[yy^* \log xx^*], \]

where $(xx^*)^{-1}$ is the inverse over the support of $xx^*$.

Proof. Expand $(xy^* + yx^*)^2$ into four terms, noting that $xy^*$ and $yx^*$ commute with $(xx^*)^{-1}$. Then $D_y^2 H(x) > 0$ if and only if

\[ \mathrm{Tr}[(xx^*)^{-1} (xy^*)^2] + \mathrm{Tr}[(xx^*)^{-1} (yx^*)^2] + 2\,\mathrm{Tr}[(xx^*)^{-1} xy^* yx^*] < -2\,\mathrm{Tr}[yy^* \log xx^*] + 2\,\mathrm{Tr}[xx^* \log xx^*]. \]

The first two terms on the LHS are twice the real part of $\mathrm{Tr}[(xx^*)^{-1} (xy^*)^2]$; the largest value of these two terms over all phases of $y$ is $2\,|\mathrm{Tr}[(xx^*)^{-1} (xy^*)^2]|$. □

For convenience, denote the terms in Corollary 1 as follows:

\[ a(x, y) = |\mathrm{Tr}[(xx^*)^{-1} (xy^*)^2]|, \quad b(x, y) = \mathrm{Tr}[(xx^*)^{-1} xy^* yx^*], \quad c(x) = \mathrm{Tr}[xx^* \log xx^*], \quad d(x, y) = \mathrm{Tr}[yy^* \log xx^*], \tag{5} \]

so $D_y^2 H(x) > 0$ if and only if $a + b < c - d$. Each of these terms behaves nicely under tensor products:

\[ a(x_1 \otimes x_2, y_1 \otimes y_2) = a(x_1, y_1)\, a(x_2, y_2), \tag{6} \]
\[ b(x_1 \otimes x_2, y_1 \otimes y_2) = b(x_1, y_1)\, b(x_2, y_2), \]
\[ c(x_1 \otimes x_2) = c(x_1) + c(x_2), \]
\[ d(x_1 \otimes x_2, y_1 \otimes y_2) = d(x_1, y_1) + d(x_2, y_2). \]
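The additivity of the terms $c$ and $d$ under tensor products can be illustrated in the commuting (diagonal) case, where $xx^*$ and $yy^*$ are represented by their eigenvalue lists, each summing to 1. This is a sketch under that simplifying assumption; the spectra and function names are hypothetical.

```python
import math

def kron_diag(u, v):
    """Diagonal of the Kronecker product of two diagonal matrices."""
    return [s * t for s in u for t in v]

def c_term(p):
    """c(x) = Tr[xx^* log xx^*] when xx^* has eigenvalues p."""
    return sum(t * math.log(t) for t in p)

def d_term(w, p):
    """d(x, y) = Tr[yy^* log xx^*] for commuting yy^* = diag(w), xx^* = diag(p)."""
    return sum(s * math.log(t) for s, t in zip(w, p))

p1, w1 = [0.6, 0.4], [0.3, 0.7]             # spectra of x_1 x_1^* and y_1 y_1^*
p2, w2 = [0.5, 0.3, 0.2], [0.2, 0.2, 0.6]   # spectra of x_2 x_2^* and y_2 y_2^*

c_tensor = c_term(kron_diag(p1, p2))
d_tensor = d_term(kron_diag(w1, w2), kron_diag(p1, p2))
```

Both identities rely on the normalizations $\mathrm{Tr}[x_i x_i^*] = \mathrm{Tr}[y_i y_i^*] = 1$.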

We can also bound the size of some of these terms for any $x$ and $y$ such that $\mathrm{Tr}[xy^*] = 0$ and $\mathrm{Tr}[xx^*] = \mathrm{Tr}[yy^*] = 1$. First, we claim $b \in [0, 1]$. To see this, note that $P = x^* (xx^*)^{-1} x$ is a projection matrix, so

\[ b = \mathrm{Tr}[y^* y\, x^* (xx^*)^{-1} x] = \|P y^*\|^2, \]

and $0 \le \|P y^*\| \le \|y^*\| = 1$. Second, we claim $a \in [0, b]$. To see this, note that without loss of generality (see Lemma 1) we can assume that $xx^*$ is invertible and therefore positive definite, so $(xx^*)^{-1/2}$ exists and commutes with $xy^*$, and so $(xx^*)^{-1} (xy^*)^2 = ((xx^*)^{-1/2} xy^*)^2$. By Cauchy-Schwarz,

\[ a = |\mathrm{Tr}[((xx^*)^{-1/2} xy^*)^2]| \le \mathrm{Tr}[((xx^*)^{-1/2} xy^*)((xx^*)^{-1/2} xy^*)^*] = b. \]

Thirdly, we claim that $c \le 0$, since it is the negative of the entropy function. We are now ready to prove the main result of this paper.

Theorem 2. Suppose $x_1$ and $x_2$ are strong local minima of $x \mapsto -\mathrm{Tr}[xx^* \log xx^*]$ subject to $\mathrm{Tr}[xx^*] = 1$ and $x_i \in K_i$, where $K_i$ is a subspace. Further assume that for every $y_i \in K_i$, the matrices $x_i x_i^*$ and $x_i y_i^*$ commute. Then $x := x_1 \otimes x_2$ is a strong local minimum in $K_1 \otimes K_2$.

Proof. We show that under the hypotheses of the theorem, if $D_{y_i}^2 H(x_i)$ is positive for every $y_i \in x_i^\perp$, then $D_y^2 H(x)$ is positive for every $y \in x^\perp$. We break the proof into several cases depending on $y$.

First, suppose $y$ is a tensor product.

Case $y = x_1 \otimes y_2$, $y_2 \in x_2^\perp$: Since $y_2 \in x_2^\perp$ and $x_2$ is a strong local minimum, we know that

\[ a(x_2, y_2) + b(x_2, y_2) < c(x_2) - d(x_2, y_2). \]

It is also easy to see from the expressions (5) that

\[ a(x_1, x_1) = b(x_1, x_1) = 1, \qquad c(x_1) = d(x_1, x_1). \]

So, using the expressions for tensors in (6), we have

\[ a(x, y) + b(x, y) = a(x_2, y_2) + b(x_2, y_2) < c(x_2) - d(x_2, y_2) = c(x) - d(x, y). \]

Thus the second directional derivative is positive for this choice of $y$.

Case $y = y_1 \otimes x_2$, $y_1 \in x_1^\perp$: This case is similar to $y = x_1 \otimes y_2$.



Case $y = y_1 \otimes y_2$, $y_i \in x_i^\perp$: Here we require the arithmetic-geometric mean inequality. For two terms $0 \le a_1, a_2 \le 1$,

\[ a_1 a_2 \le \left( \frac{a_1 + a_2}{2} \right)^2 \le \frac{1}{2}(a_1 + a_2). \]

In particular, $a(x_1, y_1)\, a(x_2, y_2) \le a(x_1, y_1) + a(x_2, y_2)$ and similarly for $b$. Now, since $y_i \in x_i^\perp$, we have $a(x_i, y_i) + b(x_i, y_i) < c(x_i) - d(x_i, y_i)$. Combining these inequalities we get $a(x, y) + b(x, y) < c(x) - d(x, y)$.

Next, we consider cases where $y$ is a linear combination of terms.

Suppose $y$ is in $x_1^\perp \otimes x_2^\perp$. In this case, we break $y$ into two orthogonal pieces according to the projection matrix $P = x^* (xx^*)^{-1} x$. Let $P_i = x_i^* (x_i x_i^*)^{-1} x_i$: this is the projection matrix onto the range of $x_i^*$, which we denote $R(x_i^*)$. Then $P = P_1 \otimes P_2$ is the projection matrix onto the range $R(x^*) = R(x_1^*) \otimes R(x_2^*)$. Write $y$ as a direct sum:

\[ y = \alpha u + \beta v, \]

where $u^* \in R(x^*)$ (so $P u^* = u^*$), and $P v^* = 0$. The normalizations are chosen so that $\alpha \in \mathbb{R}$ and $\beta \in \mathbb{R}$ satisfy $\alpha^2 + \beta^2 = 1$, and $\|u\| = \|v\| = 1$. We deal with the $u$ and $v$ components separately.

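Case $y = u \in (x_1^\perp \otimes x_2^\perp) \cap R(x^*)$: Here we have $b(x, u) = \|P u^*\|^2 = \|u^*\|^2 = 1$. Note that if $y_i$ is in $x_i^\perp$, then

\[ \mathrm{Tr}[x_i P_i y_i^*] = \mathrm{Tr}[x_i x_i^* (x_i x_i^*)^{-1} x_i y_i^*] = \mathrm{Tr}[x_i y_i^*] = 0, \]

so $P_i y_i^*$ is also in $x_i^\perp$. If we write $u = \sum_j y_{1j} \otimes y_{2j}$ with $y_{ij} \in x_i^\perp$, so that

\[ u^* = P u^* = \sum_j P_1 y_{1j}^* \otimes P_2 y_{2j}^*, \]

then $P_i y_{ij}^*$ is in $x_i^\perp \cap R(x_i^*)$, and it follows that $u$ is in $(x_1^\perp \cap R(x_1^*)) \otimes (x_2^\perp \cap R(x_2^*))$. Now perform a Schmidt decomposition of $u$ with respect to this tensor space: we get

\[ u = \sum_j \alpha_j\, u_{1j} \otimes u_{2j}, \]

where $u_{ij} \in x_i^\perp \cap R(x_i^*)$, $\mathrm{Tr}[u_{ij} u_{ik}^*] = \delta_{jk}$, $\alpha_j \ge 0$, and $\sum_j \alpha_j^2 = 1$. Since $u_{ij}$ is in $R(x_i^*)$, we have $b(x_i, u_{ij}) = 1$. Since $u_{ij}$ is in $x_i^\perp$, we know

\[ a(x_i, u_{ij}) + b(x_i, u_{ij}) < c(x_i) - d(x_i, u_{ij}), \tag{7} \]

and also $0 \le a(x_i, u_{ij}) \le 1$. Under this decomposition, we also have

\[ d(x, u) = \sum_j \alpha_j^2 \left( d(x_1, u_{1j}) + d(x_2, u_{2j}) \right). \tag{8} \]

Therefore, from (7) and (8),

\[ a(x, u) + b(x, u) \le 2 b(x, u) = \sum_j \alpha_j^2 \left[ b(x_1, u_{1j}) + b(x_2, u_{2j}) \right] < \sum_j \alpha_j^2 \left[ c(x_1) - d(x_1, u_{1j}) + c(x_2) - d(x_2, u_{2j}) \right] = c(x) - d(x, u). \]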

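Case $y = v \in x_1^\perp \otimes x_2^\perp$, $P v^* = 0$: We know that $0 \le a(x, v) \le b(x, v) = \|P v^*\|^2 = 0$, and so $a(x, v) = b(x, v) = 0$. Perform a Schmidt decomposition of $v$ with respect to the space $x_1^\perp \otimes x_2^\perp$:

\[ v = \sum_j \beta_j\, v_{1j} \otimes v_{2j}, \]

where $v_{ij} \in x_i^\perp$, $\mathrm{Tr}[v_{ij} v_{ik}^*] = \delta_{jk}$ and $\sum_j \beta_j^2 = 1$. Since $v_{ij}$ is in $x_i^\perp$, we have $0 \le a(x_i, v_{ij}) \le b(x_i, v_{ij})$ and

\[ 0 \le a(x_i, v_{ij}) + b(x_i, v_{ij}) < c(x_i) - d(x_i, v_{ij}). \tag{9} \]

It follows quickly that $a(x, v) + b(x, v) < c(x) - d(x, v)$.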
Next we deal with a combination of $u$ and $v$.

Case $y \in x_1^\perp \otimes x_2^\perp$: Write $y = \alpha u + \beta v$, where $u^* \in R(x^*)$, $P v^* = 0$, $\alpha^2 + \beta^2 = 1$, and $\|u\|^2 = \|v\|^2 = 1$. Then since $u v^* = u P v^* = 0$, we have

\[ y y^* = \alpha^2 u u^* + \beta^2 v v^*, \]

from which it follows that

\[ b(x, y) = \alpha^2 b(x, u) + \beta^2 b(x, v), \tag{10} \]
\[ d(x, y) = \alpha^2 d(x, u) + \beta^2 d(x, v). \tag{11} \]

(In fact, $b(x, u) = 1$ and $b(x, v) = 0$.) Combining (10) and (11) with the results for $u$ and $v$ from the previous cases, we get

\[ a(x, y) + b(x, y) \le 2 b(x, y) = \alpha^2\, 2 b(x, u) + \beta^2\, 2 b(x, v) < \alpha^2 [c(x) - d(x, u)] + \beta^2 [c(x) - d(x, v)] = c(x) - d(x, y). \]

Finally, we have the case where $y$ is an arbitrary element of $x^\perp$.

Case $y \in x^\perp$: Here $y$ may be written in the form

\[ y = \alpha\, x_1 \otimes y_2 + \beta\, y_1 \otimes x_2 + \gamma\, y', \]

where $y_i \in x_i^\perp$ and $y' \in x_1^\perp \otimes x_2^\perp$, with real constants satisfying $\alpha^2 + \beta^2 + \gamma^2 = 1$. Expanding out terms of $yy^*$ and simplifying, we find that most terms disappear under trace:

\[ d(x, y) = \alpha^2 [c(x_1) + d(x_2, y_2)] + \beta^2 [c(x_2) + d(x_1, y_1)] + \gamma^2 d(x, y'), \tag{12} \]
\[ b(x, y) = \alpha^2 b(x_2, y_2) + \beta^2 b(x_1, y_1) + \gamma^2 b(x, y'), \tag{13} \]
\[ a(x, y) = \left| \alpha^2\, \mathrm{Tr}[(x_2 x_2^*)^{-1} (x_2 y_2^*)^2] + \beta^2\, \mathrm{Tr}[(x_1 x_1^*)^{-1} (x_1 y_1^*)^2] + \gamma^2\, \mathrm{Tr}[(xx^*)^{-1} (x (y')^*)^2] \right|. \tag{14} \]

The expression for $d(x, y)$ requires the observation that $\mathrm{Tr}[x_i y_i^* \log x_i x_i^*] = 0$, because the first directional derivative $D_{y_i} H(x_i)$ is 0 when $x_i$ is a local minimum. The expression for $a(x, y)$ is bounded as follows:

\[ a(x, y) \le \alpha^2\, |\mathrm{Tr}[(x_2 x_2^*)^{-1} (x_2 y_2^*)^2]| + \beta^2\, |\mathrm{Tr}[(x_1 x_1^*)^{-1} (x_1 y_1^*)^2]| + \gamma^2\, |\mathrm{Tr}[(xx^*)^{-1} (x (y')^*)^2]| = \alpha^2 a(x_2, y_2) + \beta^2 a(x_1, y_1) + \gamma^2 a(x, y'). \tag{15} \]

Combining (12), (13) and (15), we get $a(x, y) + b(x, y) < c(x) - d(x, y)$. □



IV. THE SECOND DERIVATIVE OF THE 
2-NORM 

In this section we focus on the 2-norm since its second directional derivative has an elegant analytical form. We prove that if $K_1$ and $K_2$ are subspaces of matrices, at least one of which has dimension 2, and $x_1 \in K_1$, $x_2 \in K_2$ are strong local maxima of the 2-norm function

\[ x \mapsto \mathrm{Tr}[(xx^*)^2] \]

subject to $\mathrm{Tr}[xx^*] = 1$, then $x_1 \otimes x_2$ is also a strong local maximum in $K_1 \otimes K_2$. Since it is known that the 2-norm is not globally additive, this result sheds some light on the possibility that there exist functions that are locally additive while they are not globally additive.
We will work with the normalized function

\[ H_2(x) := \frac{\mathrm{Tr}[(xx^*)^2]}{[\mathrm{Tr}(xx^*)]^2}. \]

As before, we consider $(x + \epsilon y)(x + \epsilon y)^* = xx^* + \epsilon(xy^* + yx^*) + \epsilon^2 (yy^*)$, where $\mathrm{Tr}[xx^*] = \mathrm{Tr}[yy^*] = 1$ and $\mathrm{Tr}[xy^* + yx^*] = 0$. Noting that, under the trace,

\[ [(x + \epsilon y)(x + \epsilon y)^*]^2 = (xx^*)^2 + 2\epsilon\, xx^*(xy^* + yx^*) + \epsilon^2 [2 xx^* yy^* + (xy^* + yx^*)^2] + O(\epsilon^3), \]

and that $(\mathrm{Tr}[(x + \epsilon y)(x + \epsilon y)^*])^2 = (1 + \epsilon^2)^2$, we have that up to second order in $\epsilon$,

\[ H_2(x + \epsilon y) = \mathrm{Tr}[(xx^*)^2] + 2\epsilon\, \mathrm{Tr}[xx^*(xy^* + yx^*)] + \epsilon^2\, \mathrm{Tr}[2 xx^* yy^* + (xy^* + yx^*)^2 - 2 (xx^*)^2]. \]

Then the first directional derivative of $H_2$ is

\[ D_y H_2(x) = 2\,\mathrm{Tr}[xx^*(xy^* + yx^*)]. \]

By considering $iy$ as well as $y$, the condition $D_y H_2(x) = 0$ reduces to $\mathrm{Tr}[xx^* xy^*] = 0$. The second derivative is

\[ D_y^2 H_2(x) = 2\,\mathrm{Tr}[(xy^* + yx^*)^2] + 4\,\mathrm{Tr}[xx^* yy^*] - 4\,\mathrm{Tr}[(xx^*)^2]. \]

For strong local maxima we expect $D_y^2 H_2(x)$ to be negative. Expand $(xy^* + yx^*)^2$ into four terms: then

\[ \mathrm{Tr}[(xy^* + yx^*)^2] = 2\,\mathrm{Re}\,\mathrm{Tr}[(xy^*)^2] + 2\,\mathrm{Tr}[x^* x y^* y], \]

where $\mathrm{Re}(z)$ denotes the real part of $z$. The largest value of $\mathrm{Re}\,\mathrm{Tr}[(xy^*)^2]$ over all choices of unit multiples of $y$ is $|\mathrm{Tr}[(xy^*)^2]|$. In summary:

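Lemma 5. Define

\[ F(x, y) := -\mathrm{Tr}[(xx^*)^2] + |\mathrm{Tr}[(xy^*)^2]| + \mathrm{Tr}[xx^* yy^*] + \mathrm{Tr}[x^* x y^* y]. \]

Then $x$ is a strong local maximum of the function $x \mapsto \mathrm{Tr}[(xx^*)^2]$, subject to $\mathrm{Tr}[xx^*] = 1$, if and only if every $y \in x^\perp$ with $\mathrm{Tr}[yy^*] = 1$ satisfies $\mathrm{Tr}[xx^* xy^*] = 0$ and $F(x, y) < 0$.

Denote the terms in $F(x, y)$ as follows:

\[ a(x) := \mathrm{Tr}[(xx^*)^2], \quad b(x, y) := |\mathrm{Tr}[(xy^*)^2]|, \quad c(x, y) := \mathrm{Tr}[xx^* yy^*], \quad d(x, y) := \mathrm{Tr}[x^* x y^* y]. \]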

The Cauchy-Schwarz inequality implies that $|\mathrm{Tr}[z^2]| \le \mathrm{Tr}[z^* z]$ for any matrix $z$. Letting $z = xy^*$, we conclude that $0 \le b(x, y) \le c(x, y), d(x, y)$. Assuming $a > b + c + d$ therefore implies that $a > b, c, d$. Since $a = \mathrm{Tr}[(xx^*)^2] \le \mathrm{Tr}[xx^*] = 1$, we see that each of the terms $a, b, c, d$ is in the range $[0, 1]$. Furthermore, each term is multiplicative under tensor products: $a(x_1 \otimes x_2) = a(x_1)\, a(x_2)$, $b(x_1 \otimes x_2, y_1 \otimes y_2) = b(x_1, y_1)\, b(x_2, y_2)$, and so on.

From Section II we know that tensor products of critical points of the 2-norm are again critical points. We can now say the same for local maxima.
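The multiplicativity of the four terms $a, b, c, d$ of this section under tensor products, and the bound $b \le c, d$, can be checked on small real matrices, for which $x^*$ is simply the transpose. The following is a sketch with arbitrarily chosen integer matrices (so the arithmetic is exact); the helper names are hypothetical, and the matrices are not normalized, which does not affect multiplicativity.

```python
def matmul(p, q):
    """Matrix product for matrices given as lists of rows."""
    return [[sum(p[i][k] * q[k][j] for k in range(len(q)))
             for j in range(len(q[0]))] for i in range(len(p))]

def transpose(p):
    return [list(row) for row in zip(*p)]

def trace(p):
    return sum(p[i][i] for i in range(len(p)))

def kron(p, q):
    """Kronecker product of two matrices."""
    return [[pij * qkl for pij in prow for qkl in qrow]
            for prow in p for qrow in q]

def term_a(x):
    g = matmul(x, transpose(x))
    return trace(matmul(g, g))                       # Tr[(xx^*)^2]

def term_b(x, y):
    z = matmul(x, transpose(y))
    return abs(trace(matmul(z, z)))                  # |Tr[(xy^*)^2]|

def term_c(x, y):
    return trace(matmul(matmul(x, transpose(x)),
                        matmul(y, transpose(y))))    # Tr[xx^* yy^*]

def term_d(x, y):
    return trace(matmul(matmul(transpose(x), x),
                        matmul(transpose(y), y)))    # Tr[x^* x y^* y]

x1, y1 = [[1, 2], [0, 1]], [[0, 1], [1, 1]]
x2, y2 = [[2, 0], [1, 1]], [[1, -1], [0, 2]]
x, y = kron(x1, x2), kron(y1, y2)
```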

Lemma 6. Suppose $x_1$ and $x_2$ are strong local maxima of $x \mapsto \mathrm{Tr}[(xx^*)^2]$ subject to $\mathrm{Tr}[xx^*] = 1$ and $x_i \in K_i$, where either $K_1$ or $K_2$ has dimension 2. Then $x := x_1 \otimes x_2$ is a strong local maximum in $K_1 \otimes K_2$.

Proof. Without loss of generality, assume $K_1$ has dimension 2, so $x_1^\perp$ has dimension 1. Let $y_1$ be an element of $x_1^\perp$ and let $y_{2j}$ be elements of $x_2^\perp$. Then every element of $x^\perp$ in $K_1 \otimes K_2$ is a linear combination of vectors of the form $y_1 \otimes y_{21}$, $x_1 \otimes y_{22}$, and $y_1 \otimes x_2$. First, we check that for each $y$ of that form, $F(x, y)$ is negative.

Case $y = y_1 \otimes y_{21}$: Here

\[ F(x, y) = -a(x_1)\, a(x_2) + b(x_1, y_1)\, b(x_2, y_{21}) + c(x_1, y_1)\, c(x_2, y_{21}) + d(x_1, y_1)\, d(x_2, y_{21}). \]

Since $a(x_i) > b(x_i, y_i) + c(x_i, y_i) + d(x_i, y_i)$ and each term is nonnegative, it follows that $a(x_1)\, a(x_2) > b(x_1, y_1)\, b(x_2, y_{21}) + c(x_1, y_1)\, c(x_2, y_{21}) + d(x_1, y_1)\, d(x_2, y_{21})$, and so $F(x, y)$ is negative.

Case $y = x_1 \otimes y_{22}$: Here

\[ F(x, y) = -a(x_1)\, a(x_2) + a(x_1)\, b(x_2, y_{22}) + a(x_1)\, c(x_2, y_{22}) + a(x_1)\, d(x_2, y_{22}). \]

But $-a(x_2) + b(x_2, y_{22}) + c(x_2, y_{22}) + d(x_2, y_{22}) < 0$ and $a(x_1) > 0$, so $F(x, y)$ is negative.

Case $y = y_1 \otimes x_2$: Similar to $y = x_1 \otimes y_{22}$.

Now consider a linear combination of the three elements of $x^\perp$, say

\[ y = \alpha (y_1 \otimes y_{21}) + \beta (x_1 \otimes y_{22}) + \gamma (y_1 \otimes x_2). \]

In considering $b(x, y)$, most terms disappear under trace:

\[ b(x, y) = \left| \alpha^2\, \mathrm{Tr}[(x_1 y_1^*)^2]\, \mathrm{Tr}[(x_2 y_{21}^*)^2] + \beta^2\, \mathrm{Tr}[(x_1 x_1^*)^2]\, \mathrm{Tr}[(x_2 y_{22}^*)^2] + \gamma^2\, \mathrm{Tr}[(x_1 y_1^*)^2]\, \mathrm{Tr}[(x_2 x_2^*)^2] \right| \]
\[ \le |\alpha|^2\, b(x_1, y_1)\, b(x_2, y_{21}) + |\beta|^2\, a(x_1)\, b(x_2, y_{22}) + |\gamma|^2\, b(x_1, y_1)\, a(x_2). \]

Likewise we have

\[ c(x, y) = |\alpha|^2\, c(x_1, y_1)\, c(x_2, y_{21}) + |\beta|^2\, a(x_1)\, c(x_2, y_{22}) + |\gamma|^2\, c(x_1, y_1)\, a(x_2), \]

and similarly for $d(x, y)$. Adding together, we conclude that

\[ F(x, y) \le |\alpha|^2 \left[ -a(x_1)\, a(x_2) + b(x_1, y_1)\, b(x_2, y_{21}) + c(x_1, y_1)\, c(x_2, y_{21}) + d(x_1, y_1)\, d(x_2, y_{21}) \right] \]
\[ + |\beta|^2\, a(x_1) \left[ -a(x_2) + b(x_2, y_{22}) + c(x_2, y_{22}) + d(x_2, y_{22}) \right] + |\gamma|^2 \left[ -a(x_1) + b(x_1, y_1) + c(x_1, y_1) + d(x_1, y_1) \right] a(x_2). \]

The $|\alpha|^2$ term is negative by the argument given in the case $y = y_1 \otimes y_{21}$; the $|\beta|^2$ term is negative by the case $y = x_1 \otimes y_{22}$; and the $|\gamma|^2$ term is negative by the case $y = y_1 \otimes x_2$. □



If both $K_1$ and $K_2$ have dimension higher than 2, the linear combinations seem to be more difficult.

Appendix A: Affine parametrization

Let $x \in K \subset \mathbb{C}^{m \times n}$ and assume that $xx^* > 0$, i.e. $xx^*$ is invertible, and $\mathrm{Tr}\, xx^* = 1$. We now study the second variation. Let $y \in K$ and assume that $\mathrm{Tr}(xy^* + yx^*) = 0$. We then consider

\[ A(\epsilon) := A + \epsilon B + \epsilon^2 C = (x + \epsilon y)(x + \epsilon y)^*, \qquad A = xx^*,\ B = xy^* + yx^*,\ C = yy^* \in H_m. \tag{A1} \]

(Here $H_m$ is the real space of $m \times m$ Hermitian matrices.) Let $\lambda_1(\epsilon), \ldots, \lambda_m(\epsilon) > 0$ be the eigenvalues of $A(\epsilon)$, as analytic functions of $\epsilon$ (Rellich's theorem). We can assume that these eigenvalues are arranged in the following order: $\lambda_1(\epsilon) \ge \ldots \ge \lambda_m(\epsilon) > 0$ for small positive $\epsilon$. Let $A_1(\epsilon) = A + \epsilon B$. Arrange the analytic eigenvalues of $A_1(\epsilon)$ in the order $\mu_1(\epsilon) \ge \ldots \ge \mu_m(\epsilon) > 0$ for small positive $\epsilon$. Clearly, $\lambda_i(\epsilon) = \mu_i(\epsilon) + O(\epsilon^2)$ for $i = 1, \ldots, m$. The following result is known, and can be deduced from the arguments in Kato [5].

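Lemma 7. Let $A, B, C \in H_m$, and denote $A(\epsilon) = A + \epsilon B + \epsilon^2 C$, $A_1(\epsilon) = A + \epsilon B$. Assume that $\lambda_1(\epsilon), \ldots, \lambda_m(\epsilon)$ and $\mu_1(\epsilon), \ldots, \mu_m(\epsilon)$ are the analytic eigenvalues of $A(\epsilon)$ and $A_1(\epsilon)$, arranged in a nonincreasing order for small positive $\epsilon$. Then there exists a unitary matrix $U \in \mathbb{C}^{m \times m}$ with the following two properties. First, $U A U^* = \mathrm{diag}(\lambda_1, \ldots, \lambda_m)$. Second, if we denote $U C U^* = F = [f_{ij}]_{i,j=1}^{m}$ then

\[ \lambda_i(\epsilon) = \mu_i(\epsilon) + \epsilon^2 f_{ii} + O(\epsilon^3) \quad \text{for } i = 1, \ldots, m. \tag{A2} \]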

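In the next proposition we use the above lemma to calculate the variation of $S(A) = -\mathrm{Tr}(A \log A)$ up to second order.

Proposition 1. Let $x, y \in \mathbb{C}^{m \times n}$ and assume that $\mathrm{Tr}(xx^*) = \mathrm{Tr}(yy^*) = 1$, $xx^* > 0$, $\mathrm{Tr}(xy^* + yx^*) = 0$. Define $A(\epsilon), A_1(\epsilon)$ as in Eq. (A1). Then

\[ S\left( \frac{A(\epsilon)}{\mathrm{Tr}\, A(\epsilon)} \right) = S(A_1(\epsilon)) + \epsilon^2\, \mathrm{Tr}[(xx^* - yy^*) \log xx^*] + O(\epsilon^3). \tag{A3} \]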

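Proof. First recall that

\[ S\left( \frac{A(\epsilon)}{\mathrm{Tr}\, A(\epsilon)} \right) = \frac{1}{\mathrm{Tr}\, A(\epsilon)} S(A(\epsilon)) + \log \mathrm{Tr}\, A(\epsilon) = S(A(\epsilon)) + \epsilon^2 [-\mathrm{Tr}(yy^*) S(xx^*) + \mathrm{Tr}(yy^*)] + O(\epsilon^3). \tag{A4} \]

Next we claim

\[ S(A(\epsilon)) = -\sum_{i=1}^{m} \lambda_i(\epsilon) \log \lambda_i(\epsilon) = -\sum_{i=1}^{m} (\mu_i(\epsilon) + f_{ii} \epsilon^2) \log(\mu_i(\epsilon) + f_{ii} \epsilon^2) + O(\epsilon^3) \]
\[ = -\sum_{i=1}^{m} \mu_i(\epsilon) \log \mu_i(\epsilon) - \epsilon^2 \left( \sum_{i=1}^{m} f_{ii} \log \lambda_i + \sum_{i=1}^{m} f_{ii} \right) + O(\epsilon^3) = S(A_1(\epsilon)) - \epsilon^2 \left( \mathrm{Tr}((yy^*) \log(xx^*)) + \mathrm{Tr}(yy^*) \right) + O(\epsilon^3). \]

Combine this expression with the expression above it to deduce (A3). □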
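The first equality in (A4), $S(A / \mathrm{Tr} A) = S(A) / \mathrm{Tr} A + \log \mathrm{Tr} A$, holds for any positive definite $A$ and depends only on its eigenvalues. The sketch below checks it numerically for an arbitrarily chosen spectrum; the numbers and the function name are assumptions made for illustration.

```python
import math

def S(eigs):
    """S(A) = -Tr(A log A) for a positive definite A with the given eigenvalues."""
    return -sum(t * math.log(t) for t in eigs)

eigs = [0.9, 0.5, 0.25]        # spectrum of an unnormalized A, chosen arbitrarily
tr = sum(eigs)

lhs = S([t / tr for t in eigs])            # entropy of the normalized matrix
rhs = S(eigs) / tr + math.log(tr)          # right-hand side of the identity
```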

Note that the expression $\mathrm{Tr}[(xx^* - yy^*) \log xx^*]$ can be either positive or negative. In the following we give a very simple reason why we cannot ignore this term (i.e. use the affine approximation), which also yields a necessary condition $xx^*$ must satisfy if $x$ is a local minimum.

Assume that we have an affine subspace of the form $A + tB$, where $\mathrm{Tr}(A) = 1$, $\mathrm{Tr}(B) = 0$. Here $A = xx^*$, $B = xy^* + yx^*$ ranges over all $y \in K$ satisfying the condition $\mathrm{Tr}(B) = 0$, and $t$ is an arbitrary real number. Let $\Phi$ be the set of all $A + tB$ such that $A + tB \ge 0$. Consider the function $S(C) = -\mathrm{Tr}(C \log C)$ where $C \in \Phi$. Our assumption is that $A$ is a critical point in $\Phi$ for $S(C)$. Since $S(C)$ is strictly concave on $\Phi$ it follows that $A$ is the unique global maximum on $\Phi$. So if $A$ is a local minimum for $H(x)$, $x \in K$, $\mathrm{Tr}(xx^*) = 1$, it follows that the correction term for $\epsilon^2$ that we have must be strictly positive. That is, if $x$ is a local minimum then

\[ \mathrm{Tr}[(xx^* - yy^*) \log xx^*] = S(yy^*) - S(xx^*) + S(yy^* \| xx^*) > 0 \]

for all $y \in x^\perp$ (assuming the normalization $\mathrm{Tr}(yy^*) = 1$).
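The identity $\mathrm{Tr}[(xx^* - yy^*) \log xx^*] = S(yy^*) - S(xx^*) + S(yy^* \| xx^*)$ can be verified numerically when $xx^*$ and $yy^*$ commute, in which case both are described by their eigenvalues in a common basis. This is a sketch under that commuting assumption, with the relative entropy taken as $S(p \| q) = \mathrm{Tr}[p(\log p - \log q)]$; the spectra and names are hypothetical.

```python
import math

a = [0.5, 0.3, 0.2]   # eigenvalues of xx^*
c = [0.1, 0.6, 0.3]   # eigenvalues of yy^*, assumed to commute with xx^*

def S(p):
    """Von Neumann entropy -Tr(p log p) from a list of positive eigenvalues."""
    return -sum(t * math.log(t) for t in p)

def relative_entropy(p, q):
    """S(p || q) = Tr[p (log p - log q)] for commuting p, q."""
    return sum(s * (math.log(s) - math.log(t)) for s, t in zip(p, q))

lhs = sum((s - t) * math.log(s) for s, t in zip(a, c))
rhs = S(c) - S(a) + relative_entropy(c, a)
```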



Appendix B: A counterexample to the real additivity conjecture

During the 2008 American Institute for Mathematics workshop "Geometry and representation theory", Leonid Gurvits found a counterexample to the analogue of the additivity conjecture for real (rather than complex) matrices. In this appendix we generalize the counterexample to show that the additivity conjecture fails to hold for real spaces of orthogonal matrices containing the identity: there exist real subspaces $K_1 \subset \mathbb{R}^{m_1 \times m_1}$ and $K_2 \subset \mathbb{R}^{m_2 \times m_2}$ such that $H(K_1 \otimes K_2) < H(K_1) + H(K_2)$.

$K \subset \mathbb{R}^{m \times m}$ is called an orthogonal subspace if any $0 \neq A \in K$ is of the form $aQ$ for some scalar $a$ and an orthogonal matrix $Q$. Note that if $K$ is an orthogonal subspace then for any orthogonal matrix $Q_0$, the subspace $Q_0 K$ is also an orthogonal subspace. By choosing $Q_0$ with $Q_0^\top \in K$ we can always assume that $K$ contains the identity matrix $I_m$.

The maximal size of an orthogonal subspace is given by the Radon-Hurwitz number, defined as follows. For $m \in \mathbb{N}$, let $m = 2^b \cdot a$, with $a$ odd, and let $b = 4c + d$ where $c$ is a nonnegative integer and $d \in \{0, 1, 2, 3\}$. Then the Radon-Hurwitz number of $m$ is

\[ \rho(m) = 8c + 2^d. \]
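The Radon-Hurwitz number is easy to compute from the decomposition $m = 2^b \cdot a$, $b = 4c + d$ described above, using the standard formula $\rho(m) = 8c + 2^d$. The function below is a small sketch; its name is hypothetical.

```python
def radon_hurwitz(m):
    """Radon-Hurwitz number rho(m) = 8c + 2^d, where m = 2^b * a, a odd, b = 4c + d."""
    if m < 1:
        raise ValueError("m must be a positive integer")
    b = 0
    while m % 2 == 0:      # strip factors of 2 to find the exponent b
        m //= 2
        b += 1
    c, d = divmod(b, 4)    # b = 4c + d with d in {0, 1, 2, 3}
    return 8 * c + 2 ** d
```

For instance, `radon_hurwitz(m)` is 1 for every odd `m`, so an orthogonal subspace of odd-sized matrices has dimension at most 1.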

Theorem 3. Let $K \subset \mathbb{R}^{m \times m}$ be an orthogonal subspace. Then $k := \dim K \le \rho(m)$, and this inequality is sharp for any $m \in \mathbb{N}$. More precisely, assume that $I_m \in K$ and $k \ge 2$. Then $K$ has a basis $I_m, Q_1, \ldots, Q_{k-1}$ where $Q_1, \ldots, Q_{k-1}$ is a set of skew symmetric orthogonal anticommuting matrices, i.e. $Q_i Q_j = -Q_j Q_i$ for any $1 \le i < j \le k - 1$.

Conversely, if $Q_1, \ldots, Q_{k-1} \in \mathbb{R}^{m \times m}$ are $k - 1$ skew symmetric orthogonal anticommuting matrices then $\mathrm{span}(I_m, Q_1, \ldots, Q_{k-1})$ is a $k$-dimensional orthogonal subspace.

If $Q \in \mathbb{R}^{m \times m}$ is an orthogonal matrix, then all $m$ singular values of $Q$ are equal to 1. Let $Q_i \in \mathbb{R}^{m_i \times m_i}$ be an orthogonal matrix for $i = 1, 2$. Then for any real $a_1, a_2$, the singular values of $a_i Q_i$ are $|a_i|$, and the singular values of $(a_1 Q_1) \otimes (a_2 Q_2)$ are all $|a_1 a_2|$.

Suppose furthermore that $m_1, m_2$ are even and $Q_1, Q_2$ are skew symmetric orthogonal matrices. Then $a_i Q_i$ has $\frac{m_i}{2}$ eigenvalues equal to $a_i \sqrt{-1}$ and $\frac{m_i}{2}$ eigenvalues equal to $-a_i \sqrt{-1}$, for $i = 1, 2$. Furthermore, $(a_1 Q_1) \otimes (a_2 Q_2)$ is a real symmetric matrix with $\frac{m_1 m_2}{2}$ eigenvalues equal to $a_1 a_2$ and $\frac{m_1 m_2}{2}$ eigenvalues equal to $-a_1 a_2$.

Theorem 4. Let $K \subset \mathbb{R}^{m \times m}$ be an orthogonal subspace. Then $H(K) = \log m$.

Suppose furthermore that $m_1, m_2$ are even and $K_i \subset \mathbb{R}^{m_i \times m_i}$ are orthogonal subspaces of dimension at least two for $i = 1, 2$. Then

\[ H(K_1 \otimes K_2) \le \log \frac{m_1 m_2}{2} = \log(m_1 m_2) - \log 2 = H(K_1) + H(K_2) - \log 2. \tag{B1} \]

In particular, the additivity conjecture does not hold for real subspaces of matrices.



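Proof. Since any matrix $x \in K$ is of the form $aQ$ for some orthogonal $Q$, it follows that if $\mathrm{Tr}(xx^\top) = 1$ then the singular values of $x$ are all equal to $\frac{1}{\sqrt{m}}$. Hence $H(x) = \log m$ and $H(K) = \log m$.

Assume now that $K_1, K_2$ are orthogonal subspaces of dimension at least two. Without loss of generality we may assume that $I_{m_1}, Q_1 \in K_1 \subset \mathbb{R}^{m_1 \times m_1}$ and $I_{m_2}, Q_2 \in K_2 \subset \mathbb{R}^{m_2 \times m_2}$, where $Q_1, Q_2$ are skew symmetric orthogonal matrices (Theorem 3). Hence $I_{m_1 m_2} = I_{m_1} \otimes I_{m_2}$ and $Q_1 \otimes Q_2$ are both in $K_1 \otimes K_2$. Recall that $Q_1 \otimes Q_2$ is a symmetric matrix which has $\frac{m_1 m_2}{2}$ eigenvalues equal to 1 and $\frac{m_1 m_2}{2}$ eigenvalues equal to $-1$. Hence $Q_1 \otimes Q_2 + I_{m_1 m_2}$ is a nonnegative definite real symmetric matrix which has $\frac{m_1 m_2}{2}$ eigenvalues equal to 2 and $\frac{m_1 m_2}{2}$ eigenvalues equal to 0. Let

\[ x = \left( \frac{1}{2 m_1 m_2} \right)^{1/2} (Q_1 \otimes Q_2 + I_{m_1 m_2}). \]

Then $\mathrm{Tr}(xx^\top) = 1$ and $x$ has $\frac{m_1 m_2}{2}$ nonzero singular values, all equal to $\left( \frac{2}{m_1 m_2} \right)^{1/2}$. Hence $H(K_1 \otimes K_2) \le H(x) = \log \frac{m_1 m_2}{2}$. □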
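The construction in the proof can be made concrete in the smallest case $m_1 = m_2 = 2$. The sketch below assumes $K_1 = K_2 = \mathrm{span}(I_2, Q)$ with $Q$ the $2 \times 2$ rotation by a quarter turn, which is skew symmetric and orthogonal; this specific choice and the helper names are illustrative assumptions. Instead of computing eigenvalues numerically, it verifies that $M = Q \otimes Q + I$ satisfies $M^2 = 2M$, which forces the eigenvalues of $M$ to be 0 or 2.

```python
import math

def kron(p, q):
    """Kronecker product of two matrices given as lists of rows."""
    return [[pij * qkl for pij in prow for qkl in qrow]
            for prow in p for qrow in q]

def matmul(p, q):
    n = len(p)
    return [[sum(p[i][k] * q[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

Q = [[0, -1], [1, 0]]     # skew symmetric and orthogonal, m_1 = m_2 = 2
N = 4                     # m_1 * m_2
QQ = kron(Q, Q)
M = [[QQ[i][j] + (1 if i == j else 0) for j in range(N)] for i in range(N)]

# M is symmetric with M^2 = 2M, so its eigenvalues are 0 or 2; Tr M = N then
# forces exactly N/2 eigenvalues equal to 2.
M2 = matmul(M, M)
is_scaled_projection = all(M2[i][j] == 2 * M[i][j]
                           for i in range(N) for j in range(N))
trace_M = sum(M[i][i] for i in range(N))

# x = M / sqrt(2N) has xx^T with N/2 eigenvalues equal to 2/N, so H(x) = log(N/2).
H_x = -(N // 2) * (2 / N) * math.log(2 / N)
additive_value = math.log(2) + math.log(2)   # H(K_1) + H(K_2) = log m_1 + log m_2
```

Here `H_x` equals $\log 2$ while `additive_value` equals $2 \log 2$, matching the gap of $\log 2$ in (B1).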